Statistical Model Evaluation Using Reproducing Kernels and Stein's Method
Advances in computing have enabled us to develop increasingly complex statistical models, but this complexity poses challenges for their evaluation. The central theme of the thesis is addressing intractability and interpretability in model evaluation. The key tools considered in the thesis are kernel methods and Stein's method: kernel methods provide a flexible means of specifying features for comparing models, while Stein's method further allows us to incorporate model structure into the evaluation.
The first part of the thesis addresses the question of intractability. The focus is on latent variable models, a large class of models used in practice, including factor models, topic models for text, and hidden Markov models. The kernel Stein discrepancy (KSD) is extended to handle this model class. Based on this extension, a statistical hypothesis test of relative goodness of fit is developed, enabling us to compare competing latent variable models that are known only up to normalization.
The second part of the thesis concerns the question of interpretability, with two contributions. First, interpretable relative goodness-of-fit tests are developed using the kernel-based discrepancies of Chwialkowski et al. (2015) and Jitkrittum et al. (2016; 2017). These tests allow the user to choose features for comparison and to discover the aspects that distinguish two models. Second, a convergence property of the KSD is established: the KSD is shown to control an integral probability metric defined by a class of polynomially growing continuous functions. In particular, this development allows us to evaluate both unnormalized statistical models and sample approximations to posterior distributions in terms of moments.
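For reference, the kernel Stein discrepancy used throughout these abstracts admits a standard formulation (following Liu et al., 2016, and Chwialkowski et al., 2016; the notation below is a common convention rather than the thesis's own). For a differentiable density p and a reproducing kernel Hilbert space H:

```latex
% Langevin Stein operator of a differentiable density p:
(\mathcal{A}_p f)(x) \;=\; \langle f(x),\, \nabla_x \log p(x) \rangle \;+\; \nabla_x \cdot f(x)

% KSD: the worst-case expected Stein-operator value over the RKHS unit ball:
\mathrm{KSD}(p, q) \;=\; \sup_{\|f\|_{\mathcal{H}} \le 1} \; \mathbb{E}_{x \sim q}\!\left[(\mathcal{A}_p f)(x)\right]
```

Because the operator depends on p only through the score \nabla_x \log p, the normalizing constant of p cancels, which is why the KSD applies to unnormalized models.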
A Kernel Stein Test of Goodness of Fit for Sequential Models
We propose a goodness-of-fit measure for probability densities modeling observations with varying dimensionality, such as text documents of differing lengths or variable-length sequences. The proposed measure is an instance of the kernel Stein discrepancy (KSD), which has been used to construct goodness-of-fit tests for unnormalized densities. The KSD is defined by its Stein operator: current operators used in testing apply to fixed-dimensional spaces. As our main contribution, we extend the KSD to the variable-dimension setting by identifying appropriate Stein operators, and propose a novel KSD goodness-of-fit test. As with the previous variants, the proposed KSD does not require the density to be normalized, allowing the evaluation of a large class of models. Our test is shown to perform well in practice on discrete sequential data benchmarks. Comment: 18 pages. Accepted to ICML 202
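The variable-dimension Stein operators contributed by this paper are not reproduced here, but the fixed-dimension KSD it builds on has a simple closed form. Below is a minimal sketch for a one-dimensional model with an RBF kernel, following the construction of Liu et al. (2016); the function name, bandwidth, and V-statistic estimator are our own illustrative choices.

```python
import numpy as np

def ksd_v_statistic(x, score, h=1.0):
    """V-statistic estimate of the squared kernel Stein discrepancy
    between a 1-D model with score function `score` (d/dx log p) and
    the sample x, using an RBF kernel with bandwidth h."""
    x = np.asarray(x, dtype=float)
    d = x[:, None] - x[None, :]           # pairwise differences x_i - x_j
    k = np.exp(-d**2 / (2 * h**2))        # RBF kernel matrix
    s = score(x)                          # model score at each sample point
    # Closed-form Stein kernel u_p(x_i, x_j) for the RBF kernel:
    u = (s[:, None] * s[None, :] * k      # s(x_i) s(x_j) k
         + s[:, None] * (d / h**2) * k    # s(x_i) * dk/dx_j
         - s[None, :] * (d / h**2) * k    # s(x_j) * dk/dx_i
         + (1 / h**2 - d**2 / h**4) * k)  # d2k / dx_i dx_j
    return u.mean()
```

For a standard normal model, score(x) = -x; a sample drawn from the model yields a near-zero estimate, while a sample from a shifted distribution yields a clearly positive one, without the normalizing constant of p ever being evaluated.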
Deep Proxy Causal Learning and its Application to Confounded Bandit Policy Evaluation
Proxy causal learning (PCL) is a method for estimating the causal effect of treatments on outcomes in the presence of unobserved confounding, using proxies (structured side information) for the confounder. This is achieved via two-stage regression: in the first stage, we model relations among the treatment and proxies; in the second stage, we use this model to learn the effect of treatment on the outcome, given the context provided by the proxies. PCL guarantees recovery of the true causal effect, subject to identifiability conditions. We propose a novel method for PCL, the deep feature proxy variable method (DFPV), to address the case where the proxies, treatments, and outcomes are high-dimensional and have complex nonlinear relationships, as represented by deep neural network features. We show that DFPV outperforms recent state-of-the-art PCL methods on challenging synthetic benchmarks, including settings involving high-dimensional image data. Furthermore, we show that PCL can be applied to off-policy evaluation for the confounded bandit problem, in which DFPV also exhibits competitive performance.
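As a toy illustration of the two-stage idea (not the DFPV method itself, which replaces these linear stages with learned deep features), the sketch below uses ridge regression for both stages. The variable names A (treatment), Z (treatment proxy), W (outcome proxy), and Y (outcome) are our own conventions, not the paper's notation.

```python
import numpy as np

def two_stage_pcl(A, Z, W, Y, reg=1e-3):
    """Simplified linear two-stage proxy causal learning.
    Stage 1: predict the outcome proxy W from the treatment A and proxy Z.
    Stage 2: regress the outcome Y on A and the stage-1 prediction."""
    def ridge(X, t):
        X1 = np.column_stack([X, np.ones(len(X))])  # add intercept column
        return np.linalg.solve(X1.T @ X1 + reg * np.eye(X1.shape[1]), X1.T @ t)

    # Stage 1: model relations among the treatment and proxies
    X1 = np.column_stack([A, Z])
    beta1 = ridge(X1, W)
    W_hat = np.column_stack([X1, np.ones(len(A))]) @ beta1

    # Stage 2: learn the effect of treatment on the outcome,
    # given the context provided by the (predicted) proxy
    beta2 = ridge(np.column_stack([A, W_hat]), Y)
    return beta2  # [effect of A, effect of W_hat, intercept]
```

On linear synthetic data with an unobserved confounder, the stage-2 coefficient on A recovers the structural treatment effect, whereas a direct regression of Y on A remains biased by the confounder.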
Testing Goodness of Fit of Conditional Density Models with Kernels
We propose two nonparametric statistical tests of goodness of fit for
conditional distributions: given a conditional probability density function
and a joint sample, decide whether the sample is drawn from
for some density . Our tests, formulated with a Stein
operator, can be applied to any differentiable conditional density model, and
require no knowledge of the normalizing constant. We show that 1) our tests are
consistent against any fixed alternative conditional model; 2) the statistics
can be estimated easily, requiring no density estimation as an intermediate
step; and 3) our second test offers an interpretable test result providing
insight on where the conditional model does not fit well in the domain of the
covariate. We demonstrate the interpretability of our test on a task of
modeling the distribution of New York City's taxi drop-off location given a
pick-up point. To our knowledge, our work is the first to propose such
conditional goodness-of-fit tests that simultaneously have all these desirable
properties. Comment: In UAI 2020. http://auai.org/uai2020/accepted.ph
Informative Features for Model Comparison
Given two candidate models, and a set of target observations, we address the
problem of measuring the relative goodness of fit of the two models. We propose
two new statistical tests which are nonparametric, computationally efficient
(runtime complexity is linear in the sample size), and interpretable. As a
unique advantage, our tests can produce a set of examples (informative
features) indicating the regions in the data domain where one model fits
significantly better than the other. In a real-world problem of comparing GAN
models, the test power of our new test matches that of the state-of-the-art
test of relative goodness of fit, while being one order of magnitude faster. Comment: Accepted to NIPS 201
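The informative-features idea can be illustrated with a toy mean-embedding witness (a simplified stand-in, not the paper's exact statistic; the kernel, locations, and function names are ours): evaluating the difference of kernel mean embeddings at a small set of test locations gives each location an interpretable, per-region measure of misfit, computable in time linear in the sample size.

```python
import numpy as np

def ume2_per_location(x, y, V, h=1.0):
    """Per-location squared witness between samples x and y at test
    locations V, using an RBF kernel: large values flag regions of
    the domain where the two samples differ."""
    k = lambda a, b: np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * h ** 2))
    witness = k(x, V).mean(axis=0) - k(y, V).mean(axis=0)
    return witness ** 2

def relative_fit(p_sample, q_sample, data, V, h=1.0):
    """Negative when model P's sample is closer to the data than model Q's,
    as measured by the averaged per-location squared witness."""
    return (ume2_per_location(p_sample, data, V, h).mean()
            - ume2_per_location(q_sample, data, V, h).mean())
```

The per-location contributions show where one model fits worse than the other, while their average gives a single relative-fit statistic.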
A Kernel Stein Test for Comparing Latent Variable Models
We propose a kernel-based nonparametric test of relative goodness of fit,
where the goal is to compare two models, both of which may have unobserved
latent variables, such that the marginal distribution of the observed variables
is intractable. The proposed test generalises the recently proposed kernel
Stein discrepancy (KSD) tests (Liu et al., 2016, Chwialkowski et al., 2016,
Yang et al., 2018) to the case of latent variable models, a much more general
class than the fully observed models treated previously. As our main
theoretical contribution, we prove that the new test, with a properly
calibrated threshold, has a well-controlled type-I error. In the case of models
with low-dimensional latent structure and high-dimensional observations, our
test significantly outperforms the relative Maximum Mean Discrepancy test,
which cannot exploit the latent structure. Comment: update test statistic (MCMC version)